A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
Similar resources
Audio-Visual Correlation Modeling for Speaker Identification and Synthesis
This thesis addresses two major problems of multimodal signal processing using audiovisual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audiovisual speaker recognition problem within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late inte...
Audio-visual speaker recognition using time-varying stream reliability prediction
We examine a time-varying, context dependent information fusion methodology for multi-stream authentication based on audio and video data collected simultaneously during a user’s interaction with a system. Scores obtained from the two data streams are combined based on the relative local richness, as compared to the training data or derived model, and stability of each stream. The results show ...
Dynamic visual features for audio-visual speaker verification
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we w...
Speaker adaptation for audio-visual speech recognition
In this paper, speaker adaptation is investigated for audiovisual automatic speech recognition (ASR) using the multistream hidden Markov model (HMM). First, audio-only and visual-only HMM parameters are adapted by combining maximum a posteriori and maximum likelihood linear regression adaptation. Subsequently, the audio-visual HMM stream exponents are adapted to better capture the reliability o...
Audio-Visual Clustering for Multiple Speaker Localization
We address the issue of identifying and localizing individuals in a scene that contains several people engaged in conversation. We use a human-like configuration of sensors (binaural and binocular) to gather both auditory and visual observations. We show that the identification and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. W...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2011
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.e94.d.2052